Seungryul Choi, Nicholas Kohout, Sumit Pamnani, Dongkeun Kim, Donald Yeung
May 2004 ACM Transactions on Computer Systems (TOCS), volume 22 issue 2 

Publisher: ACM Press 

Full text available: pdf(2.45 MB) Additional Information: full citation, abstract, references, index terms

Pointer-chasing applications tend to traverse composite data structures consisting of 
multiple independent pointer chains. While the traversal of any single pointer chain leads 
to the serialization of memory operations, the traversal of independent pointer chains 
provides a source of memory parallelism. This article investigates exploiting such 
interchain memory parallelism for the purpose of memory latency tolerance, using a 
technique called multi-chain prefetching. Previous work ... 

Keywords: Data prefetching, memory parallelism, pointer-chasing code 
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Siddhartha Chatterjee 

July 1993 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 15 issue 3
Full text available: pdf(4.17 MB) Additional Information: full citation, references, citings, index terms, review



Keywords: compilers, data parallelism, shared-memory multiprocessors 
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May 2002 ACM SIGPLAN Notices , Proceedings of the ACM SIGPLAN 2002 Conference 
on Programming language design and implementation PLDI '02, volume 37 issue 5
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Prefetching data ahead of use has the potential to tolerate the growing processor-memory 
performance gap by overlapping long latency memory accesses with useful computation. 
While sophisticated prefetching techniques have been automated for limited domains, 
such as scientific codes that access dense arrays in loop nests, a similar level of success 
has eluded general-purpose programs, especially pointer-chasing codes written in 
languages such as C and C++. We address this problem by describing ... 

Keywords: data reference profiling, dynamic optimization, dynamic profiling, memory 
performance optimization, prefetching, temporal profiling 



Detecting data races in Cilk programs that use locks

Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, Andrew F. Stark
June 1998 Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures SPAA '98
Publisher: ACM Press 

Full text available: pdf(1.81 MB) Additional Information: full citation, references, citings, index terms



Keywords: Cilk, algorithm, data race, debugging, multithreading, parallel programming, 
race detection 



Data cache locking for higher program predictability
Xavier Vera, Bjorn Lisper, Jingling Xue 

June 2003 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
2003 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems SIGMETRICS '03, volume 31 issue 1

Publisher: ACM Press 

Full text available: pdf(292.01 KB) Additional Information: full citation, abstract, references, citings, index terms

Caches have become increasingly important with the widening gap between main memory 
and processor speeds. However, they are a source of unpredictability due to their 
characteristics, resulting in programs behaving in a different way than expected. Cache 
locking mechanisms adapt caches to the needs of real-time systems. Locking the cache is 
a solution that trades performance for predictability: at a cost of generally lower 
performance, the time of accessing the memory becomes predictable. This pape ...

Keywords: data cache analysis, worst-case execution time 



GPGPU: general purpose computation on graphics hardware

David Luebke, Mark Harris, Jens Kruger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff Woolley, Aaron Lefohn

August 2004 ACM SIGGRAPH 2004 Course Notes SIGGRAPH '04 
Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, citings

The graphics processor (GPU) on today's commodity video cards has evolved into an 
extremely powerful and flexible processor. The latest graphics architectures provide 
tremendous memory bandwidth and computational horsepower, with fully programmable 
vertex and pixel processing units that support vector operations up to full IEEE floating 
point precision. High level languages have emerged for graphics hardware, making this 
computational power accessible. Architecturally, GPUs are highly parallel s ... 
Cryptography and data security
Dorothy Elizabeth Robling Denning 
January 1982 Book 

Publisher: Addison-Wesley Longman Publishing Co., Inc.

Full text available: pdf(19.47 MB) Additional Information: full citation, abstract, references, citings, index terms

From the Preface (See Front Matter for full Preface) 

Electronic computers have evolved from exiguous experimental enterprises in the 1940s 
to prolific practical data processing systems in the 1980s. As we have come to rely on 
these systems to process and store data, we have also come to wonder about their ability 
to protect valuable data. 

Data security is the science and study of methods of protecting data in computer and 
communication systems from unauthorized disclosure ... 

An Automatic Technique for Selection of Data Representations in SETL Programs
Edmond Schonberg, Jacob T. Schwartz, Micha Sharir 

April 1981 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 3 issue 2
Publisher: ACM Press 

Full text available: pdf(1.22 MB) Additional Information: full citation, references, citings, index terms



The architecture of concurrent programs
Per Brinch Hansen 
January 1977 Book 

Publisher: Prentice-Hall, Inc. 

Full text available: pdf(10.71 MB) Additional Information: full citation, abstract, references, citings, index terms

From the Preface 



CONCURRENT PROGRAMMING 

This book describes a method for writing concurrent computer programs of high quality. It 
is written for professional programmers and students who are faced with the complicated 
task of building reliable computer operating systems or real-time control programs. 

The motivations for mastering concurrent programming are both economic and 
intellectual. Concurrent programming makes it possible to use a compu ... 

10 Accelerator: using data parallelism to program GPUs for general-purpose uses
David Tarditi, Sidd Puri, Jose Oglesby 

October 2006 ACM SIGARCH Computer Architecture News, ACM SIGOPS Operating
Systems Review, ACM SIGPLAN Notices, Proceedings of the 12th
international conference on Architectural support for programming
languages and operating systems ASPLOS-XII, volume 34, 40, 41 issue 5, 5, 11

Publisher: ACM Press 

Full text available: pdf(266.52 KB) Additional Information: full citation, abstract, references, index terms

GPUs are difficult to program for general-purpose uses. Programmers can either learn 
graphics APIs and convert their applications to use graphics pipeline operations or they 
can use stream programming abstractions of GPUs. We describe Accelerator, a system 
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that uses data parallelism to program GPUs for general-purpose uses instead. 
Programmers use a conventional imperative programming language and a library that 
provides only high-level data-parallel operations. No aspects of GPUs are exposed to ... 

Keywords: data parallelism, graphics processing units, just-in time compilation 



11 Parallel execution of prolog programs: a survey

Gopal Gupta, Enrico Pontelli, Khayri A.M. Ali, Mats Carlsson, Manuel V. Hermenegildo
July 2001 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 23 issue 4

Publisher: ACM Press 

Full text available: pdf(1.95 MB) Additional Information: full citation, abstract, references, citings, index terms

Since the early days of logic programming, researchers in the field realized the potential 
for exploitation of parallelism present in the execution of logic programs. Their high-level 
nature, the presence of nondeterminism, and their referential transparency, among other 
characteristics, make logic programs interesting candidates for obtaining speedups 
through parallel execution. At the same time, the fact that the typical applications of logic 
programming frequently involve irregular computatio ... 

Keywords: Automatic parallelization, constraint programming, logic programming, 
parallelism, prolog 



12 Assembly instruction level reverse execution for debugging
Tankut Akgul, Vincent J. Mooney III 

April 2004 ACM Transactions on Software Engineering and Methodology (TOSEM), volume 13 issue 2
Publisher: ACM Press 

Full text available: pdf(1.18 MB) Additional Information: full citation, abstract, references, index terms

Assembly instruction level reverse execution provides a programmer with the ability to 
return a program to a previous state in its execution history via execution of a "reverse 
program." The ability to execute a program in reverse is advantageous for shortening 
software development time. Conventional techniques for recovering a state rely on saving 
the state into a record before the state is destroyed. However, state-saving causes 
significant memory and time overheads during forward execution. Th ...

Keywords: Debugging, reverse code generation, reverse execution 




13 Streamlining data cache access with fast address calculation
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurindar S. Sohi

May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd
annual international symposium on Computer architecture ISCA '95, volume 23 issue 2
Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms



For many programs, especially integer codes, untolerated load instruction latencies 
account for a significant portion of total execution time. In this paper, we present the 
design and evaluation of a fast address generation mechanism capable of eliminating the 
delays caused by effective address calculation for many loads and stores. Our approach 
works by predicting early in the pipeline (part of) the effective address of a memory 
access and using this predicted address to speculatively access the ... 
14 Session S9.2: embedded programs: Ensuring code safety without runtime checks for real-time control systems
Sumant Kowshik, Dinakar Dhurjati, Vikram Adve 

October 2002 Proceedings of the 2002 international conference on Compilers, 
architecture, and synthesis for embedded systems CASES '02 

Publisher: ACM Press 

Full text available: pdf(127.10 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper considers the problem of providing safe programming support and enabling 
secure online software upgrades for control software in real-time control systems. In such 
systems, offline techniques for ensuring code safety are greatly preferable to online 
techniques. We propose a language called Control-C that is essentially a subset of C, but 
with key restrictions designed to ensure that memory safety of code can be verified 
entirely by static checking, under certain system assumpti ... 

Keywords: compiler, control, programming language, real-time, security, static analysis 



15 Writing efficient programs
Jon Louis Bentley 

January 1982 Book 

Publisher: Prentice-Hall, Inc. 

Additional Information: full citation, abstract, references, citings, index terms

The primary task of software engineers is the cost-effective development of maintainable 
and useful software. There are many secondary problems lurking in that definition. One 
such problem arises from the term "useful": to be useful in the application at hand, 
software must often be efficient (that is, use little time or space). The problem we will 
consider in this book is building efficient software systems. 

There are a number of levels at which we may confront the problem of efficien ... 

16 Third Generation Computer Systems 
Peter J. Denning

December 1971 ACM Computing Surveys (CSUR), volume 3 issue 4 

Publisher: ACM Press 

Full text available: pdf(3.52 MB) Additional Information: ... index terms

The common features of third generation operating systems are surveyed from a general 
view, with emphasis on the common abstractions that constitute at least the basis for a 
"theory" of operating systems. Properties of specific systems are not discussed except 
where examples are useful. The technical aspects of issues and concepts are stressed, the 
nontechnical aspects mentioned only briefly. A perfunctory knowledge of third generation 
systems is presumed. 

17 Secure program execution via dynamic information flow tracking
G. Edward Suh, Jae W. Lee, David Zhang, Srinivas Devadas 

October 2004 ACM SIGPLAN Notices, ACM SIGOPS Operating Systems Review, ACM
SIGARCH Computer Architecture News, Proceedings of the 11th
international conference on Architectural support for programming
languages and operating systems ASPLOS-XI, volume 39, 38, 32 issue 11, 5, 5
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms

Full text available: 
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We present a simple architectural mechanism called dynamic information flow tracking 
that can significantly improve the security of computing systems with negligible 
performance overhead. Dynamic information flow tracking protects programs against 
malicious software attacks by identifying spurious information flows from untrusted I/O 
and restricting the usage of the spurious information. Every security attack to take control 
of a program needs to transfer the program's control to malevolent code. ... 

Keywords: buffer overflow, format string, hardware tagging 



18 Randomized instruction set emulation
Elena Gabriela Barrantes, David H. Ackley, Stephanie Forrest, Darko Stefanovic 
February 2005 ACM Transactions on Information and System Security (TISSEC), volume 8 issue 1
Publisher: ACM Press 

Full text available: pdf(374.44 KB) Additional Information: full citation, abstract, references, citings, index terms

Injecting binary code into a running program is a common form of attack. Most defenses 
employ a "guard the doors" approach, blocking known mechanisms of code injection. 
Randomized instruction set emulation (RISE) is a complementary method of defense, one 
that performs a hidden randomization of an application's machine code. If foreign binary 
code is injected into a program running under RISE, it will not be executable because it 
will not know the proper randomization. The pape ... 

Keywords: Automated diversity, randomized instruction sets, software diversity 

19 Memory system performance of programs with intensive heap allocation
Amer Diwan, David Tarditi, Eliot Moss 

August 1995 ACM Transactions on Computer Systems (TOCS), volume 13 issue 3

Publisher: ACM Press 

Full text available: pdf(2.10 MB) Additional Information: full citation, abstract, references, citings, index terms

Heap allocation with copying garbage collection is a general storage management 
technique for programming languages. It is believed to have poor memory system 
performance. To investigate this, we conducted an in-depth study of the memory system 
performance of heap allocation for memory systems found on many machines. We studied 
the performance of mostly functional Standard ML programs which made heavy use of 
heap allocation. We found that most machines support heap allocation poorly. Howeve ... 

Keywords: automatic storage reclamation, copying garbage collection, garbage 
collection, generational garbage collection, heap allocation, page mode, subblock 
placement, write through, write-back, write-buffer, write-miss policy, write-policy 



20 Link-time binary rewriting techniques for program compaction
Bjorn De Sutter, Bruno De Bus, Koen De Bosschere 

September 2005 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 27 issue 5
Publisher: ACM Press 

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Small program size is an important requirement for embedded systems with limited 
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amounts of memory. We describe how link-time compaction through binary rewriting can 
achieve code size reductions of up to 62% for statically bound languages such as
C, C++, and Fortran, without compromising on performance. We demonstrate
how the limited amount of information about a program at link time can be exploited to 
overcome overhead resulting from separate compilation. This is done with sc ... 

Keywords: Program representation, binary rewriting, code abstraction, compaction, 
interprocedural analysis, linker, whole-program optimization 
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